X-ray micro-computed tomography (micro-CT) has been widely leveraged to characterise pore-scale geometry in subsurface porous rock. Recent developments in super-resolution (SR) methods using deep learning allow the digital enhancement of low-resolution (LR) images over large spatial scales, creating SR images comparable to high-resolution (HR) ground truth. This circumvents the traditional resolution and field-of-view trade-off. An outstanding issue is the use of paired (registered) LR and HR data, which is often required in the training step of such methods but is difficult to obtain. In this work, we rigorously compare two state-of-the-art SR deep learning techniques, using both paired and unpaired data, against like-for-like ground-truth data. The first approach requires paired images to train a convolutional neural network (CNN), while the second approach uses unpaired images to train a generative adversarial network (GAN). The two approaches are compared on a micro-CT carbonate rock sample with complex micro-porous textures. We implement various image-based and numerical verifications, as well as experimental validation, to quantitatively evaluate the physical accuracy and sensitivities of the two methods. Our quantitative results show that the unpaired GAN approach can reconstruct super-resolution images as precisely as the paired CNN method, with comparable training times and dataset requirements. This unlocks new applications for micro-CT image enhancement using unpaired deep learning methods: image registration is no longer needed during the data-processing stage, and decoupled images from data-storage platforms can be exploited more efficiently to train networks for SR digital rock applications. This opens up a new pathway for various applications of multi-scale flow simulation in heterogeneous porous media.
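The core practical difference between the two approaches is the training signal. Below is a minimal PyTorch sketch of the contrast: the paired CNN is supervised directly against the registered HR ground truth, while the unpaired GAN learns from a discriminator, so the LR and HR batches never need to be registered. The networks and loss weighting are illustrative placeholders, not the exact architectures compared in the paper.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()
bce = nn.BCEWithLogitsLoss()

def paired_step(net, lr_img, hr_img):
    """Paired CNN: supervised loss against the registered HR ground truth."""
    sr = net(lr_img)
    return l1(sr, hr_img)

def unpaired_step(gen, disc, lr_img, hr_img):
    """Unpaired GAN: the discriminator supplies the training signal, so the
    LR and HR batches need not be registered to each other. Real systems
    typically add a cycle-consistency or perceptual term, omitted here."""
    sr = gen(lr_img)
    real_logits = disc(hr_img)
    fake_logits = disc(sr)
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits.detach(), torch.zeros_like(fake_logits))
    g_loss = bce(fake_logits, torch.ones_like(fake_logits))
    return d_loss, g_loss
```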
There are inherent field-of-view and resolution trade-offs in X-ray micro-computed tomography imaging that limit the characterization, analysis and model development of multi-scale porous systems. In this paper, we overcome these trade-offs by developing a 3D Enhanced Deep Super Resolution (EDSR) convolutional neural network that creates enhanced high-resolution data over large spatial scales from low-resolution data. Paired high-resolution (HR, 2 $\mu$m) and low-resolution (LR, 6 $\mu$m) image data from a Bentheimer rock sample were used to train the network. Unseen LR and HR data from the training sample, as well as from another sample with a distinct microstructure, were used to validate the network with various metrics: textural analysis, segmentation behaviour and pore-network model (PNM) multiphase flow simulations. The validated EDSR network was used to generate around 1000 high-resolution REV sub-volume images for each full core sample, each 6-7 cm in length (total image size roughly 6000x6000x32000 voxels). Each sub-volume has distinct petrophysical properties predicted from PNMs, which are combined to create a 3D continuum-scale model of each sample. Low-capillary-number immiscible flow is simulated across a range of fractional flows and compared directly, on a 1:1 basis, to experimental pressures and 3D saturations. The EDSR-generated models are more accurate than the base LR models at predicting experimental behaviour in the presence of heterogeneities, especially in flow regimes where a wide distribution of pore sizes is encountered. The models are generally accurate at predicting saturations to within experimental repeatability, and relative permeability across three orders of magnitude. The demonstrated workflow is fully predictive, requires no calibration, and opens up the possibility of imaging, simulating and analysing flow in truly multi-scale heterogeneous systems.
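For context, a minimal sketch of an EDSR-style 3D network follows, assuming residual blocks without batch normalization and with residual scaling, as in the original 2D EDSR; the block count, channel width, and upsampling tail are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """EDSR-style residual block: no batch norm, scaled residual."""
    def __init__(self, channels=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        return x + self.res_scale * self.body(x)

class EDSR3D(nn.Module):
    """LR (6 um) -> HR (2 um) is a 3x upscale, done here with a simple
    trilinear-upsample + conv tail."""
    def __init__(self, channels=64, n_blocks=8, scale=3):
        super().__init__()
        self.head = nn.Conv3d(1, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock3D(channels) for _ in range(n_blocks)])
        self.up = nn.Upsample(scale_factor=scale, mode='trilinear', align_corners=False)
        self.tail = nn.Conv3d(channels, 1, 3, padding=1)

    def forward(self, x):
        f = self.head(x)
        f = f + self.body(f)          # global residual connection
        return self.tail(self.up(f))
```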
Pure transformers have shown great potential for vision tasks recently. However, their accuracy on small or medium-sized datasets is not satisfactory. Although some existing methods introduce a CNN as a teacher to guide the training process by distillation, the gap between the teacher and student networks can lead to sub-optimal performance. In this work, we propose a new One-shot Vision transformer search framework with Online distillation, namely OVO. OVO samples sub-nets for both the teacher and student networks for better distillation results. Benefiting from online distillation, thousands of sub-nets in the supernet are well trained without extra fine-tuning or retraining. In experiments, OVO-Ti achieves 73.32% top-1 accuracy on ImageNet and 75.2% on CIFAR-100.
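A minimal sketch of the online-distillation step described above: both teacher and student are sub-nets sampled from the same supernet, so the teacher improves alongside the student and no separately pre-trained CNN teacher is needed. The `supernet.sample(...)` interface and the loss weighting are hypothetical, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ovo_step(supernet, images, labels, tau=1.0):
    """One online-distillation step: a larger sampled sub-net teaches a
    smaller one via soft targets, plus the usual hard-label loss."""
    teacher_cfg = supernet.sample(prefer='large')   # hypothetical sampler
    student_cfg = supernet.sample(prefer='small')
    with torch.no_grad():
        t_logits = supernet(images, cfg=teacher_cfg)
    s_logits = supernet(images, cfg=student_cfg)
    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                  F.softmax(t_logits / tau, dim=1),
                  reduction='batchmean') * tau ** 2
    return ce + kd
```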
We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and quantify the uncertainty in that surrogate by deriving an acquisition function. This acquisition function represents the probability of improvement based on the kernel of the Gaussian process, which guides the search in the optimization process. The critical challenge in designing Bayesian optimization algorithms on manifolds lies in the difficulty of constructing valid covariance kernels for Gaussian processes on general manifolds. Our approach is to employ extrinsic Gaussian processes, first embedding the manifold into a higher-dimensional Euclidean space via an equivariant embedding and then constructing a valid covariance kernel on the image manifold after the embedding. This leads to efficient and scalable algorithms for optimization over complex manifolds. Simulation studies and real data analyses demonstrate the utility of our eBO framework by applying eBO to various optimization problems over manifolds such as the sphere, the Grassmannian, and the manifold of positive definite matrices.
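A minimal sketch of the extrinsic-kernel construction, shown for the sphere S^2, whose equivariant embedding into R^3 is simply the inclusion of unit vectors: embed the points, apply an ordinary RBF kernel in the ambient space, and use the resulting Gram matrices in a standard GP posterior. The embeddings for the Grassmannian or for positive definite matrices are more involved; all constants here are illustrative.

```python
import numpy as np

def embed_sphere(x):
    """Points on S^2 are already unit vectors in R^3."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def extrinsic_rbf_kernel(X, Y, lengthscale=0.5):
    """RBF kernel applied to the embedded (ambient-space) coordinates."""
    EX, EY = embed_sphere(X), embed_sphere(Y)
    sq = np.sum(EX**2, 1)[:, None] + np.sum(EY**2, 1)[None, :] - 2 * EX @ EY.T
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-6):
    """Standard GP posterior mean/variance, using the extrinsic kernel."""
    K = extrinsic_rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = extrinsic_rbf_kernel(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x,x) = 1 for RBF
    return mu, var
```

An acquisition function (e.g. probability of improvement) can then be evaluated from `mu` and `var` over candidate points sampled on the manifold.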
Optical flow, which computes the apparent motion from a pair of video frames, is a critical tool for scene motion estimation. The correlation volume is the central component of neural optical flow models: it estimates the pairwise matching costs between cross-frame features, which are then decoded into optical flow. However, the traditional correlation volume is frequently noisy, outlier-prone, and sensitive to motion blur. We observe that, although the recent RAFT algorithm also adopts the traditional correlation volume, its additional context encoder provides semantically representative features to the flow decoder, implicitly compensating for the deficiency of the correlation volume. However, the benefits of this context encoder have barely been discussed or exploited. In this paper, we first investigate the functionality of RAFT's context encoder, then propose a new Context Guided Correlation Volume (CGCV) via gating and lifting schemes. CGCV can be universally integrated with RAFT-based flow computation methods for enhanced performance, and is especially effective in the presence of motion blur, defocus blur, and atmospheric effects. By incorporating the proposed CGCV into the previous Global Motion Aggregation (GMA) method, at a minor cost of 0.5% extra parameters, the rank of GMA is lifted by 23 places on the KITTI 2015 leaderboard and 3 places on the Sintel leaderboard. Moreover, at a similar model size, our correlation volume achieves competitive or superior performance to state-of-the-art supervised peer models that employ Transformers or graph reasoning, as verified by extensive experiments.
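A minimal sketch of the gating idea: the context-encoder features produce a sigmoid gate that modulates the raw correlation features before decoding. The channel count follows RAFT's standard 4-level, radius-3 lookup (4 x 49 = 196 channels); the gating network itself is an illustrative stand-in for the paper's full CGCV design.

```python
import torch
import torch.nn as nn

class GatedCorrelation(nn.Module):
    def __init__(self, context_dim=128, corr_channels=196):
        super().__init__()
        # 4 pyramid levels x (2*3+1)^2 = 196 correlation channels, as in RAFT.
        self.gate = nn.Sequential(
            nn.Conv2d(context_dim, corr_channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, corr, context):
        # corr:    (B, 196, H, W) sampled correlation features
        # context: (B, 128, H, W) context-encoder features
        # The gate suppresses noisy / outlier-prone correlation responses.
        return corr * self.gate(context)
```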
Image harmonization aims to produce visually harmonious composite images by adjusting the foreground appearance to be compatible with the background. When the composite image has a photographic foreground and a painterly background, the task is called painterly image harmonization. There are only a few works on this task, and they are either time-consuming or weak at generating well-harmonized results. In this work, we propose a novel painterly harmonization network consisting of a dual-domain generator and a dual-domain discriminator, which harmonizes the composite image in both the spatial domain and the frequency domain. The dual-domain generator performs harmonization using AdaIN modules in the spatial domain and our proposed ResFFT modules in the frequency domain. The dual-domain discriminator attempts to distinguish inharmonious patches based on the spatial and frequency features of each patch, which enhances the ability of the generator in an adversarial manner. Extensive experiments on the benchmark dataset show the effectiveness of our method. Our code and model are available at https://github.com/bcmi/PHDNet-Painterly-Image-Harmonization.
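A minimal sketch of the two domains: AdaIN re-normalizes foreground features with background statistics (spatial domain), while an FFT-based residual block filters features in the frequency domain. Both modules are simplified stand-ins for the paper's AdaIN and ResFFT blocks.

```python
import torch
import torch.nn as nn

def adain(fg_feat, bg_feat, eps=1e-5):
    """Adaptive instance normalization: impose background feature
    statistics on the foreground features."""
    fg_mu = fg_feat.mean((2, 3), keepdim=True)
    fg_std = fg_feat.std((2, 3), keepdim=True)
    bg_mu = bg_feat.mean((2, 3), keepdim=True)
    bg_std = bg_feat.std((2, 3), keepdim=True)
    return bg_std * (fg_feat - fg_mu) / (fg_std + eps) + bg_mu

class FreqResBlock(nn.Module):
    """Residual block operating on the 2D Fourier spectrum of the features;
    the complex spectrum is handled as stacked real/imaginary channels."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, 2 * channels, 1)

    def forward(self, x):
        spec = torch.fft.rfft2(x, norm='ortho')
        f = torch.cat([spec.real, spec.imag], dim=1)
        f = self.conv(f)
        real, imag = f.chunk(2, dim=1)
        out = torch.fft.irfft2(torch.complex(real, imag),
                               s=x.shape[-2:], norm='ortho')
        return x + out
```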
Automatic defect detection for 3D printing processes, which shares many characteristics with change detection problems, is a vital step for quality control of 3D printed products. However, there are some critical challenges in the current state of practice. First, existing methods for computer vision-based process monitoring typically work well only under specific camera viewpoints and lighting conditions, requiring expensive pre-processing, alignment, and camera setups. Second, many defect detection techniques are specific to pre-defined defect patterns and/or print schematics. In this work, we approach the automatic defect detection problem differently, using a novel Semi-Siamese deep learning model that directly compares a reference schematic of the desired print with a camera image of the achieved print. The model then solves an image segmentation problem, identifying the locations of defects with respect to the reference frame. Unlike most change detection problems, our model is specially developed to handle images coming from different domains and is robust against perturbations in the imaging setup such as camera angle and illumination. Defect localization predictions were made in 2.75 seconds per layer on a standard MacBook Pro, which is comparable to the typical tens of seconds or less needed to print a single layer on an inkjet-based 3D printer, while achieving an F1-score of more than 0.9.
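A minimal sketch of the Semi-Siamese structure: unlike a classic Siamese network, the two branches do not share weights, since the schematic and the camera image come from different domains; a shared segmentation head then predicts per-pixel defect logits. The encoder and head bodies are illustrative placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

def small_encoder():
    """Toy encoder that downsamples by 4x."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    )

class SemiSiameseDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Separate (non-shared) branches: the inputs come from different domains.
        self.enc_schematic = small_encoder()   # reference print schematic
        self.enc_camera = small_encoder()      # photo of the achieved print
        self.head = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(64, 1, 1),               # per-pixel defect logits
        )

    def forward(self, schematic, camera):
        f = torch.cat([self.enc_schematic(schematic),
                       self.enc_camera(camera)], dim=1)
        return self.head(f)
```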
As a neural network compression technique, post-training quantization (PTQ) transforms a pre-trained model into a quantized model using a lower-precision data type. However, the prediction accuracy decreases because of the quantization noise, especially in extremely low-bit settings. How to determine the appropriate quantization parameters (e.g., scaling factors and the rounding of weights) is the main open problem. Many existing methods determine the quantization parameters by minimizing the distance between features before and after quantization, but using this distance as the optimization metric considers only local information. We analyze the problem of minimizing local metrics and indicate that it does not result in optimal quantization parameters. Furthermore, the quantized model suffers from overfitting due to the small number of calibration samples in PTQ. In this paper, we propose PD-Quant to solve these problems. PD-Quant uses the difference between the network's predictions before and after quantization to determine the quantization parameters. To mitigate the overfitting problem, PD-Quant adjusts the distribution of activations in PTQ. Experiments show that PD-Quant leads to better quantization parameters and improves the prediction accuracy of quantized models, especially in low-bit settings. For example, PD-Quant pushes the accuracy of ResNet-18 up to 53.08% and RegNetX-600MF up to 40.92% with 2-bit weights and 2-bit activations. The code will be released at https://github.com/hustvl/PD-Quant.
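A minimal sketch of the prediction-difference idea: rather than matching each layer's features locally, the quantization parameters are tuned so that the quantized network's final prediction stays close to the full-precision one. The `quantized_forward` hook, which would apply fake quantization with the given scale parameters, is hypothetical.

```python
import torch
import torch.nn.functional as F

def prediction_difference_loss(fp_model, quantized_forward, scales, calib_batch):
    """Global (prediction-level) objective for tuning quantization scales,
    in contrast to per-layer feature-distance objectives."""
    with torch.no_grad():
        fp_logits = fp_model(calib_batch)
    q_logits = quantized_forward(calib_batch, scales)  # scales require grad
    # KL between the two predictive distributions = the "prediction difference".
    return F.kl_div(F.log_softmax(q_logits, dim=1),
                    F.softmax(fp_logits, dim=1),
                    reduction='batchmean')
```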
In the presence of noisy labels, designing robust loss functions is critical for securing the generalization performance of deep neural networks. The Cross Entropy (CE) loss has been shown not to be robust to noisy labels due to its unboundedness. To alleviate this issue, existing works typically design specialized robust losses satisfying the symmetric condition, which usually leads to underfitting. In this paper, our key idea is to induce a loss bound at the logit level, thus universally enhancing the noise robustness of existing losses. Specifically, we propose logit clipping (LogitClip), which clamps the norm of the logit vector to ensure that it is upper bounded by a constant. In this manner, the CE loss equipped with our LogitClip method is effectively bounded, mitigating overfitting to examples with noisy labels. Moreover, we present theoretical analyses to certify the noise-tolerant ability of LogitClip. Extensive experiments show that LogitClip not only significantly improves the noise robustness of the CE loss, but also broadly enhances the generalization performance of popular robust losses.
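A minimal sketch of LogitClip as described: clamp the L2 norm of the logit vector to an upper bound before applying the usual cross-entropy loss. The bound `delta` is a hyperparameter; the rescaling form below is one natural reading of the clamping operation.

```python
import torch
import torch.nn.functional as F

def logit_clip(logits, delta=1.0, eps=1e-7):
    """Rescale each logit vector so its L2 norm is at most `delta`."""
    norms = logits.norm(p=2, dim=-1, keepdim=True)
    # Only shrink vectors whose norm exceeds the bound; leave others intact.
    scale = (delta / (norms + eps)).clamp(max=1.0)
    return logits * scale

def clipped_ce(logits, targets, delta=1.0):
    """CE on clipped logits: the per-example loss is now bounded,
    limiting the influence of mislabeled examples."""
    return F.cross_entropy(logit_clip(logits, delta), targets)
```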
Federated learning (FL) is a promising approach for enabling the future Internet of Vehicles, consisting of intelligent connected vehicles (ICVs) with powerful sensing, computing and communication capabilities. We consider a base station (BS) coordinating nearby ICVs to train a neural network in a collaborative yet distributed manner, in order to limit data traffic and privacy leakage. However, due to the mobility of vehicles, the connections between the BS and the ICVs are short-lived, which affects the resource utilization of ICVs and thus the convergence speed of the training process. In this paper, we propose an accelerated FL-ICV framework that optimizes the duration of each training round and the number of local iterations for better convergence performance of FL. We propose a mobility-aware optimization algorithm called MOB-FL, which aims to maximize the resource utilization of ICVs under short-lived wireless connections so as to increase the convergence speed. Simulation results based on beam selection and trajectory prediction tasks verify the effectiveness of the proposed solution.
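A heavily simplified toy sketch of the trade-off MOB-FL navigates: a longer round buys more local iterations per vehicle, but more vehicles leave coverage before uploading their update. The exponential connection-lifetime model and all constants below are illustrative assumptions, not the paper's system model.

```python
import numpy as np

def expected_useful_iterations(round_duration, iter_time=0.05, mean_sojourn=10.0):
    """Expected local iterations that actually get uploaded, per vehicle,
    assuming exponentially distributed connection lifetimes."""
    p_complete = np.exp(-round_duration / mean_sojourn)   # still connected at upload
    local_iters = np.floor(round_duration / iter_time)
    return p_complete * local_iters

# Sweep candidate round durations and pick the best under this toy model.
durations = np.linspace(0.1, 20.0, 200)
best = durations[np.argmax([expected_useful_iterations(d) for d in durations])]
print(f"best round duration under this toy model: {best:.2f} s")
```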